3D motion tracking is a critical task in many computer vision applications. Unsupervised
markerless 3D motion tracking systems determine the most relevant object on the screen and then
track it by continuously estimating its projection features (center and area) from the edge image
and a point inside the relevant object projection (namely, the inner point), until the tracking
fails. Existing reliable object projection feature estimation techniques are based on ray-casting
or grid-filling from the inner point. These techniques assume the edge image to be accurate.
However, in real-world scenarios, edge miscalculations may arise from low contrast between the
target object and its surroundings, or from motion blur caused by low frame rates or fast-moving
target objects. In this paper, we propose a barrier extension to casting-based techniques that
mitigates the effect of edge miscalculations.
I. INTRODUCTION

Optical motion tracking, simply called motion tracking
in this paper, means continuously locating a moving
object in a video sequence. 2D tracking aims at following
the image projection of objects that move within a 3D
space. 3D tracking aims at estimating the movement of an
object relative to the camera in all six degrees of
freedom (DOFs): the three position DOFs and the three
orientation DOFs [11].

A 3D motion tracking technique that only estimates
the three position DOFs (namely moving up and down,
moving left and right, and moving forward and backward)
is enough to provide a three-dimensional cursor-like
input device driver [1], [13].

Such an input device could be used as a standard 2D
mouse-like pointing device that interprets depth changes
as mouse-like clicks. It also lays the groundwork for the
development of virtual device drivers (i.e., device drivers
implemented in software rather than in hardware)
that consider three-dimensional position coordinates.

Real-time 3D motion tracking techniques have direct 
applications in several large market areas [16]: the
surveillance industry, which benefits from motion detec- 
tion and tracking 0, H| LLI| ; the leisure industry, which 
benefits from novel human-computer interaction techniques
[7], [15]; the medical and military industries, which
benefit from perceptual interfaces [2], augmented reality
@, and object detection and tracking [l|,|j,|a|; and the au- 
tomotive industry, which benefits from driver assistance 
systems [3].

A 3D motion tracking system that only requires a sin- 
gle low-budget camera can be implemented in a wide 
spectrum of computers and smartphones that already 
have such a capture device installed. 

There exist unsupervised markerless 3D motion tracking
techniques [12]-[14] that need no training, calibration,
or knowledge of the target object, and only require a
single low-budget camera and an evenly colored object
that is distinguishable from its surroundings.

These motion tracking techniques consist of a subsystem
that determines the most relevant object on the
screen, and a subsystem that performs the tracking by
continuously estimating the target object projection
features (center and area) from the edge image and a point
inside the object projection.

Existing object projection feature estimation techniques
perform ray-casting [12], [13] or grid-filling [14] from
the inner point and estimate the center as the average of
the ray hit locations and the area as the coverage
of the rays.

These techniques assume the edge image to be accurate.
However, in real-world scenarios, edge miscalculations
may arise from low contrast between the target object
and its surroundings, or from motion blur caused by low
frame rates or fast-moving target objects.

In this paper, we propose a barrier extension to 
casting-based techniques that mitigates the effect of edge 
miscalculations. 

Section II covers the definition of the object projection
feature estimation problem and the existing techniques
for solving it. Section III describes our barrier extension
to casting-based techniques. Finally, Section IV summarizes
our conclusions and outlines the future
work that derives from our research.



II. BACKGROUND 

Unsupervised markerless 3D motion tracking techniques
require estimating the centroid and the area of
the projection of a target object given an edge image
and a point inside the object projection (namely, the inner
point) [12], [13]. The inner point also has to be updated
to increase the probability of it being inside the object
projection in the next frame. We call this the object
projection feature estimation problem.

Figure 1 depicts examples of a convex object projection
feature estimation problem and a non-convex object
projection feature estimation problem.



Figure 1 The object projection feature estimation problem
consists in, given an edge image and a point inside the object
projection (namely, the inner point), estimating the object
projection centroid, estimating the object projection area, and
updating the inner point in order to increase the probability of
it being inside the object projection in the next frame. Examples
of a convex object projection feature estimation problem (sphere
projection) and of a non-convex object projection feature
estimation problem (hand projection).

It should be noted that the inner point may be
enclosed in a small isolated area (e.g., a finger, when the
target object is a hand).

It should also be noted that, due to the object movement
between frames, the current inner point may be relocated
to a position that will be outside
the object projection in the next frame.

Each of the following subsections describes an approach
for solving the object projection feature estimation
problem.



A. Feature Estimation Based on n-Ray-Casting

Using this technique, n rays are cast from the inner
point position in different directions until they hit an edge
in the edge image [12], [13].

The new centroid position is estimated to be the average
of the ray hit locations.

In order to update the inner point, it is displaced
towards the new centroid until it reaches it or an edge.
Then, rays are cast from the inner point and it is relocated
to the average of the ray hit locations,
in order to center it in the area of the projection in which
it is located, which reduces the probability of it being
outside the object projection in the next frame.

The area is estimated to be the sum of the lengths of
the cast rays.

Figure 2 illustrates 32-ray-casting being applied to a
convex object projection and to a non-convex object
projection.



Figure 2 32-ray-casting being applied to the estimation of the
features of a convex object projection (sphere projection) and
of a non-convex object projection (hand projection).

The main drawback of this technique is that the estimations
may not be accurate when it is applied to non-convex
object projections (e.g., a hand projection). In
that case, the ray hit locations might be representative
of just a fragment of the projection, in particular when
the inner point is in a small isolated area of the object
projection. The centroid and the area might be inaccurately
estimated, and the estimations may vary greatly
depending on the position of the inner point relative to
the object projection and on the ray orientations.

The likelihood of edge miscalculations (i.e., the edges
not being calculated correctly) having a high impact on
the projection area and centroid estimations is inversely
proportional to n.
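
As an illustration, the centroid and area estimation of n-ray-casting can be sketched in Python as follows. This is a minimal sketch under assumptions that are not part of the original description: the edge image is a list of rows of booleans, rays advance in unit steps, the image border is treated as an edge, and the names `cast_ray` and `n_ray_casting` are hypothetical. The inner point update is omitted for brevity.

```python
import math

def cast_ray(edges, start, angle):
    """Walk from `start` along `angle`; return the last free position reached.

    Assumption: `edges[y][x]` is truthy for edge pixels, and the image
    border is treated as an edge.
    """
    height, width = len(edges), len(edges[0])
    x, y = start
    dx, dy = math.cos(angle), math.sin(angle)
    while True:
        # Pixel that the next unit step would land on.
        px, py = math.floor(x + dx + 0.5), math.floor(y + dy + 0.5)
        if not (0 <= px < width and 0 <= py < height) or edges[py][px]:
            return (x, y)
        x, y = x + dx, y + dy

def n_ray_casting(edges, inner, n=32):
    """Estimate centroid and area from n evenly spaced rays (hypothetical helper)."""
    hits = [cast_ray(edges, inner, 2 * math.pi * k / n) for k in range(n)]
    # Centroid: average of the ray hit locations.
    centroid = (sum(hx for hx, _ in hits) / n, sum(hy for _, hy in hits) / n)
    # Area: sum of the lengths of the cast rays.
    area = sum(math.dist(inner, hit) for hit in hits)
    return centroid, area
```

Evenly spaced ray orientations are a design choice of this sketch; the description above only requires the rays to point in different directions.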



B. Feature Estimation Based on Iterative n-Ray-Casting 

This technique is an iterative extension to n-ray-casting.

Using this technique, n rays are cast from the inner
point position in different directions until they hit an edge
in the edge image.

The new centroid position is estimated to be the average
of the ray hit locations of the last iteration.

The inner point is displaced towards the new centroid
until it reaches it or an edge.

The process is repeated until the centroid and inner
point adjustment is negligible or a maximum number of
iterations is reached.

Then, rays are cast from the inner point and it is
relocated to the average of the ray hit locations,
in order to center it in the area of the projection in which
it is located, which reduces the probability of it being
outside the object projection in the next frame.

The area is estimated to be the sum of the lengths of the
rays cast during the last iteration.
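
The iterative procedure of this subsection can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the boolean edge image, the unit-step ray walk, the border acting as an edge, the `tolerance` and `max_iterations` defaults, and all function names are assumptions made for illustration.

```python
import math

def cast_ray(edges, start, angle):
    """Return the last free position reached from `start` along `angle`."""
    height, width = len(edges), len(edges[0])
    x, y = start
    dx, dy = math.cos(angle), math.sin(angle)
    while True:
        px, py = math.floor(x + dx + 0.5), math.floor(y + dy + 0.5)
        if not (0 <= px < width and 0 <= py < height) or edges[py][px]:
            return (x, y)
        x, y = x + dx, y + dy

def mean_point(points):
    """Average of a list of (x, y) positions."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))

def move_towards(edges, start, target):
    """Displace `start` towards `target` until it reaches it or an edge."""
    distance = math.dist(start, target)
    if distance == 0:
        return start
    angle = math.atan2(target[1] - start[1], target[0] - start[0])
    reach = cast_ray(edges, start, angle)
    # Stop at the target if the free path is at least that long (approximate).
    return target if math.dist(start, reach) >= distance else reach

def iterative_n_ray_casting(edges, inner, n=32, max_iterations=10, tolerance=0.5):
    angles = [2 * math.pi * k / n for k in range(n)]
    for _ in range(max_iterations):
        hits = [cast_ray(edges, inner, a) for a in angles]
        centroid = mean_point(hits)
        area = sum(math.dist(inner, hit) for hit in hits)  # last iteration only
        moved = move_towards(edges, inner, centroid)
        shift, inner = math.dist(inner, moved), moved
        if shift < tolerance:  # the adjustment is negligible
            break
    # Final re-centering of the inner point within its surrounding area.
    inner = mean_point([cast_ray(edges, inner, a) for a in angles])
    return centroid, area, inner
```

Starting from an off-center inner point in a convex projection, each iteration of this sketch pulls the inner point towards the middle of the area covered by the rays.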

Figure 3 illustrates two steps of iterative 32-ray-casting
being applied to a convex object projection and to a
non-convex object projection.





Figure 3 Two steps of iterative 32-ray-casting being applied 
to the estimation of the features of a convex object projec- 
tion (sphere projection) and to a non-convex object projection 
(hand projection). Images on the left show the first iteration. 
Images on the right show the second iteration. 



It should be noted that iterative n-ray-casting can re- 
locate the inner point into wider areas and therefore pro- 
duce better estimations of the object projection centroid 
and area. Indeed, it can be observed that it produces 
better results than n-ray-casting when the target object 
is non-convex and the inner point is in a small isolated 
area of the target object projection. 

Although the iterative nature of this technique tends to
relocate the centroid into wider areas, the estimations
are still not accurate when the technique is applied
to non-convex object projections, as the ray hit locations
might be representative of just a fragment of the object
projection.

It should be noted that the centroid is not guaran- 
teed to converge, and the estimations may greatly vary 
depending on the position of the inner point relative to 
the object projection, on the ray orientations, and on the 
maximum number of iterations. 

The likelihood of edge miscalculations having a high
impact on the projection area and centroid estimations is
inversely proportional to n.



C. Feature Estimation Based on Iterative n^y-Ray-Casting

This technique is an extension to iterative n-ray-casting 

mp. 

Using this technique, n rays are cast from the inner
point position in different directions until they hit an edge
in the edge image.

Then, n rays are cast from each of the ray hit locations
of the last iteration. This re-casting process is repeated
y times, for a total of n^y rays being cast in the
latest iteration.

The new centroid position is estimated to be the average
of the ray hit locations of the last iteration.

The inner point is displaced towards the new centroid
until it reaches it or an edge.

Then, rays are cast from the inner point and it is
relocated to the average of the ray hit locations,
in order to center it in the area of the projection in which
it is located, which reduces the probability of it being
outside the object projection in the next frame.

The process is repeated until the centroid and inner
point adjustment is negligible or a maximum number of
iterations is reached.

The area is estimated to be the sum of the lengths of the
rays cast during the last iteration.
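
The re-casting step that distinguishes this technique can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the boolean edge image, the unit-step ray walk, the border acting as an edge, and the names `recast` and `estimate` are hypothetical. The outer iteration and the inner point update are omitted.

```python
import math

def cast_ray(edges, start, angle):
    """Return the last free position reached from `start` along `angle`."""
    height, width = len(edges), len(edges[0])
    x, y = start
    dx, dy = math.cos(angle), math.sin(angle)
    while True:
        px, py = math.floor(x + dx + 0.5), math.floor(y + dy + 0.5)
        if not (0 <= px < width and 0 <= py < height) or edges[py][px]:
            return (x, y)
        x, y = x + dx, y + dy

def recast(edges, inner, n=16, y=2):
    """Cast n rays from the inner point, then n rays from every hit, y levels deep.

    Returns the (origin, hit) pairs of the last level, n**y in total.
    """
    angles = [2 * math.pi * k / n for k in range(n)]
    origins, rays = [inner], []
    for _ in range(y):
        rays = [(o, cast_ray(edges, o, a)) for o in origins for a in angles]
        origins = [hit for _, hit in rays]
    return rays

def estimate(rays):
    """Centroid: mean of the hit locations. Area: sum of the ray lengths."""
    hits = [hit for _, hit in rays]
    centroid = (sum(h[0] for h in hits) / len(hits),
                sum(h[1] for h in hits) / len(hits))
    area = sum(math.dist(origin, hit) for origin, hit in rays)
    return centroid, area
```

Because the number of rays grows as n^y, even small values of y make each iteration of this sketch considerably more expensive than plain n-ray-casting.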

Figure 4 illustrates 16^2-ray-casting being applied to
a convex object projection and to a non-convex object
projection.





Figure 4 Two steps of iterative 16^2-ray-casting being applied
to the estimation of the features of a convex object projection
(sphere projection) and of a non-convex object projection
(hand projection). Images on the left show the first iteration.
Images on the right show the second iteration.



It should be noted that the inner point is relocated into
wider areas in non-convex object projections very slowly,
because isolated areas near the current inner point have
a higher ray density than wider areas, which renders the latter
less relevant for the estimation of the projection centroid
and area. On the other hand, iterative n^y-ray-casting
covers the projection better than iterative n-ray-casting,
and therefore outperforms it.

It should be noted that this technique, like iterative
n-ray-casting, does not guarantee that the centroid converges, and
results may still vary greatly depending on the position
of the inner point relative to the object projection, on
the ray orientations, and on the maximum number of
iterations.

The likelihood of edge miscalculations having a high
impact on the projection area and centroid estimations is
inversely proportional to n^y. It should be noted that edge
miscalculations near the inner point may produce very
inaccurate results.



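D. Feature Estimation Based on Iterative n^y-Ray-Casting
with m-Rasterization

This technique is an extension to iterative n^y-ray-casting
[12], [13] that addresses its main problems: the slow relocation
of the inner point into wider areas and the lack of guaranteed
convergence.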

Using this technique, n rays are cast from the inner
point position in different directions until they hit an edge
in the edge image.

Then, n rays are cast from each of the ray hit locations
of the last iteration. This re-casting process is repeated
y times, for a total of n^y rays being cast in the
latest iteration.

Now, a rasterization process takes place. Every m×m
block that any of the rays runs through is selected.

The new centroid position is estimated to be the average
of the selected block positions.

The inner point is displaced towards the new centroid
until it reaches it or an edge.

Then, rays are cast from the inner point and it is
relocated to the average of the ray hit locations,
in order to center it in the area of the projection in which
it is located, which reduces the probability of it being
outside the object projection in the next frame.

The process is repeated until the centroid and inner
point adjustment is negligible or a maximum number of
iterations is reached. It should be noted that, as blocks
always represent areas inside the object projection, no
blocks are unselected between iterations.

The area is estimated to be the sum of the selected
block areas.
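
The rasterization step can be sketched in Python as follows. This is a minimal sketch under assumptions that are not in the original description: each ray is given as a pair of end points, rays are sampled at roughly one-pixel steps, a block is identified by its (column, row) index, and the names `rasterize` and `block_features` are hypothetical.

```python
import math

def rasterize(rays, m):
    """Return the set of m x m blocks that any ray runs through.

    Assumption: each ray is an ((x0, y0), (x1, y1)) pair with non-negative
    coordinates, sampled at roughly one-pixel steps.
    """
    blocks = set()
    for (x0, y0), (x1, y1) in rays:
        steps = max(1, math.ceil(math.dist((x0, y0), (x1, y1))))
        for s in range(steps + 1):
            t = s / steps
            x, y = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
            blocks.add((int(x // m), int(y // m)))
    return blocks

def block_features(blocks, m):
    """Centroid: mean of the block centers. Area: sum of the block areas."""
    centers = [((bx + 0.5) * m, (by + 0.5) * m) for bx, by in blocks]
    centroid = (sum(c[0] for c in centers) / len(centers),
                sum(c[1] for c in centers) / len(centers))
    area = len(blocks) * m * m
    return centroid, area
```

Keeping the selected blocks between iterations then amounts to accumulating them in a single set, for example `selected |= rasterize(rays, m)` after each iteration, so that a block counts once regardless of how many rays cross it.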

Figure 5 illustrates 16^2-ray-casting with 8-rasterization
being applied to a convex object projection and to a
non-convex object projection.

It should be noted that the inner point moves to wider
areas in non-convex object projections more quickly than when
applying iterative n^y-ray-casting, because high-ray-density
areas are given the same relevance as low-ray-density
areas. Fewer iterations are necessary for the estimations
to be accurate, so processing times are lower than
those of iterative n^y-ray-casting without rasterization,
although they may still be prohibitive for certain applications.
It should also be noted that when m is too high,
the projection centroid and area estimations will be
imprecise due to the low resolution of the block selection; when
m is too low, the technique behaves like iterative
n^y-ray-casting without rasterization, which causes the inner
point to be displaced slowly.
Figure 5 Two steps of iterative 16^2-ray-casting with
8-rasterization being applied to the estimation of the features
of a convex object projection (sphere projection) and of a
non-convex object projection (hand projection). Images on
the left show the first iteration. Images on the right show the
second iteration.



As the selected blocks are kept between iterations, the
inner point and the centroid are guaranteed to converge.
Although results may vary depending on the position of
the inner point relative to the object projection, on the
ray orientations, and on the maximum number of iterations,
they will be similar for convex object projections
and for non-convex object projections whose isolated areas
are not too large.

The likelihood of edge miscalculations having a high
impact on the projection area and centroid estimations is
inversely proportional to n^y · i, where i is the number of
performed iterations, as the final estimations depend on
the rays cast during every iteration.

Filling-based techniques consist in locating every piece 
of the object projection until reaching the edges, and 
therefore fit the object projection better than ray- 
casting-based techniques. Depending on the resolution 
and the parameters, filling-based techniques may also re- 
quire a lower processing time than some of the more com- 
plex iterative ray-casting-based techniques. It should be 
noted that, when using filling-based techniques, the fea- 
ture estimations calculated from different inner points 
located in different parts of the same object will be the 
same. 



E. Feature Estimation Based on Pixel-Filling

This technique is an approach to the object projection
feature estimation problem that requires a lower processing
time than ray-casting-based techniques.

Using this technique, the object projection is covered
by filling it up to the edges.

A pixel queue is initialized with the inner point position
pixel. While the queue contains pixels, a pixel is
extracted from the queue. If the pixel is an edge in the
edge image, it is ignored. If the pixel is not an edge, it is
marked as part of the object projection and all the pixels
next to it that have not been marked are added to the
queue.

The new centroid position is estimated to be the average
of the marked pixel positions.

The inner point is displaced towards the new centroid
until it reaches it or an edge. Then, rays are cast from
the inner point and it is relocated to the average of the
ray hit locations, in order to center it in the area of the
projection in which it is located, which reduces tracking errors.

The area is estimated to be the number of marked
pixels.

Figure 6 illustrates pixel-filling being applied to a convex
and a non-convex object projection.



Figure 6 Pixel-filling being applied to the estimation of the
features of a convex object projection (sphere projection) and
of a non-convex object projection (hand projection).

It should be noted that this technique does not need
to be iteratively applied, and the obtained results are the
same independently of the inner point position.

However, this technique presents a major drawback
that renders it unusable: edge miscalculations have a high
impact on the projection area and centroid estimations,
as a single-pixel edge miscalculation would allow the filling
to expand out of the actual object projection.



F. Feature Estimation Based on m-Grid-Casting

This technique is an approach to the object projection
feature estimation problem that solves the aforementioned
problem of the pixel-filling-based technique.

Using this technique, a grid consisting of m×m cells is
cast centered on the inner point position and expanded
until it reaches the edges.

A pixel queue is initialized with the inner point position
pixel. While the queue contains pixels, a pixel is
extracted from the queue. If the pixel is an edge in the
edge image, it is ignored. If the pixel is not an edge, it is
marked as part of the object projection and all the pixels
next to it that would be in the grid and that have not
been marked are added to the queue.

The new centroid position is estimated to be the average
of the grid pixel positions.

The inner point is displaced towards the new centroid
until it reaches it or an edge. Then, rays are cast from
the inner point and it is relocated to the average of the
ray hit locations, in order to center it in the area of the
projection in which it is located, which reduces tracking errors.

The area is estimated to be the number of grid
pixel positions.

Figure 7 illustrates grid-casting being applied to a convex
and a non-convex object projection.



Figure 7 8-grid-casting being applied to the estimation of the
features of a convex object projection (sphere projection) and
of a non-convex object projection (hand projection).

It should be noted that this approach solves all the
problems of the aforementioned techniques: it is not
as processing-time intensive as iterative n-ray-casting;
it produces similar results independently of the inner
point position; it allows the estimation of features of
non-convex objects; it makes non-convex zones as relevant
as convex zones, as the grid pixel density is the
same in the whole object projection; and the likelihood
of edge miscalculations having a high impact on the
projection area and centroid estimations is not as high as
with pixel-filling.

Also, as the grid size can be configured, the processing
time requirements can be adjusted for low-budget
processor devices such as smartphones.
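
The queue-based filling described for pixel-filling and m-grid-casting can be sketched in Python as follows. This is a minimal sketch based on one reading of the description: grid pixels are taken to be the pixels lying on horizontal or vertical lines spaced m pixels apart and passing through the inner point, 4-connectivity is used, and the edge image is a list of rows of booleans. The name `grid_fill` is hypothetical. With m = 1 every pixel lies on the grid, so the same function performs plain pixel-filling.

```python
from collections import deque

def grid_fill(edges, inner, m=1):
    """Fill from `inner` along grid lines spaced m pixels apart.

    Assumptions: `edges[y][x]` is truthy for edge pixels, and `inner` is an
    integer (x, y) pixel that is not an edge.
    """
    height, width = len(edges), len(edges[0])
    ix, iy = inner

    def on_grid(x, y):
        return (x - ix) % m == 0 or (y - iy) % m == 0

    marked = set()
    queue = deque([inner])
    while queue:
        x, y = queue.popleft()
        if (x, y) in marked or edges[y][x]:
            continue  # already handled, or an edge pixel: ignore it
        marked.add((x, y))
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny) not in marked and on_grid(nx, ny)):
                queue.append((nx, ny))
    if not marked:
        raise ValueError("inner point lies on an edge")
    # Centroid: average of the marked pixel positions. Area: their count.
    centroid = (sum(x for x, _ in marked) / len(marked),
                sum(y for _, y in marked) / len(marked))
    return centroid, len(marked)
```

In this sketch, a larger m visits fewer pixels, which is how the grid size trades estimation resolution for processing time.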



III. USING BARRIERS IN CASTING TECHNIQUES 

The studied object projection feature estimation tech- 
niques are sensitive to edge miscalculations in different 
degrees. A single edge miscalculation (e.g. an edge pixel 
not marked as edge) may cause ray-casting, pixel-filling 
and grid-filling to spread outside of the object projection 
and significantly alter the feature estimations, causing 
failures in the motion tracking. 

Each time a target pixel is visited in ray-casting- or 
grid-filling-based techniques, it is accessed from a source 
neighboring pixel. The vector determined by the source 
and target pixel positions provides context information 
that can be exploited to enhance the detection of edge 
collisions, thus allowing the enhancement of the existing 
object projection feature estimation techniques. 

Existing techniques check whether a pixel that is to be
visited is an edge pixel in the edge image, in which case
it is discarded. It should be noted that, in this case, a
pixel-wide edge miscalculation is enough to cause very
inaccurate results, as the casting can progress past the
undetected edge and spread to other object projections,
as seen in Figure 8.










Figure 8 Even single edge miscalculations allow ray-casting, 
pixel-filling and grid-filling to spread outside of the object 
projection. 

We propose the use of barriers in ray-casting- and
grid-filling-based techniques to mitigate the impact of edge
miscalculations on object projection feature estimation.
When using barriers, a barrier perpendicular to the vector
determined by the source and target pixel positions
is computed. Instead of checking a single pixel in order
to progress, the whole barrier of pixels is checked, and if
any of the barrier pixels are edge pixels, the target pixel
is discarded, as seen in Figure 9.

Figure 9 A simple barrier strategy prevents ray-casting,
pixel-filling and grid-filling from spreading outside of the object
projection through edge miscalculations.

Since ray-casting and grid-filling techniques usually
reach each zone of the object projection from different
pixels (i.e., with different vectors), spurious edge pixels
would cause different barriers to be considered, and therefore
they would not negatively affect the results.

The size of the barrier can be adjusted to any odd 
number. A minimal barrier of size 3 makes the techniques 
insensitive to edge miscalculations of up to 2 pixels wide. 
A barrier of size 5 makes the techniques insensitive to 
edge miscalculations of up to 4 pixels wide. 
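
The barrier check can be sketched in Python as follows, applied here to a simple queue-based fill. This is a minimal sketch under assumptions that are not part of the original description: moves are restricted to the four axis-aligned neighbors so that the perpendicular barrier is a straight run of pixels, barrier pixels outside the image are ignored, and the names `barrier`, `blocked` and `fill_with_barrier` are hypothetical. A barrier of size 1 reduces to the single-pixel check of the existing techniques.

```python
from collections import deque

def barrier(source, target, size=3):
    """Pixels of a barrier through `target`, perpendicular to source -> target."""
    dx, dy = target[0] - source[0], target[1] - source[1]
    half = size // 2
    return [(target[0] - k * dy, target[1] + k * dx)
            for k in range(-half, half + 1)]

def blocked(edges, source, target, size=3):
    """True if any in-bounds barrier pixel is an edge pixel."""
    height, width = len(edges), len(edges[0])
    return any(0 <= x < width and 0 <= y < height and edges[y][x]
               for x, y in barrier(source, target, size))

def fill_with_barrier(edges, inner, size=3):
    """Queue-based fill that discards targets whose barrier touches an edge."""
    height, width = len(edges), len(edges[0])
    marked = {inner}
    queue = deque([inner])
    while queue:
        x, y = queue.popleft()
        for target in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            tx, ty = target
            if (0 <= tx < width and 0 <= ty < height
                    and target not in marked
                    and not blocked(edges, (x, y), target, size)):
                marked.add(target)
                queue.append(target)
    return marked
```

In this sketch, pixels that touch edges on two sides (such as corners) cannot be entered from any direction, so the filled region is slightly smaller than the true projection; this is the price paid for not leaking through narrow gaps in the edge image.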

The implementation of barriers in existing techniques
only increases the running time by a constant multiplicative
factor, as a fixed number of barrier pixels is checked per
visited pixel instead of a single one, and it greatly improves
the results of ray-casting and grid-filling object projection
feature estimation techniques.



IV. CONCLUSIONS AND FUTURE WORK 

In this paper, we have studied the object projection 
feature estimation problem in 3D motion tracking and 
we have studied the existing ray-casting-based and grid- 
casting-based techniques for solving it. 

We have proposed the use of barriers during the cast- 
ing, which makes the existing techniques less sensitive to 
edge miscalculations. 

Our proposal allows the development of more accurate 
unsupervised markerless 3D motion tracking systems. 

Also, as our proposal reduces the error on the feature 
estimations, and these errors could cause tracking errors, 
it allows the development of more robust unsupervised 
markerless 3D motion tracking systems. 

We plan to keep reducing the impact of edge miscal- 
culations in the feature estimations by using color-space 
information. 